Direct Feedback Alignment Provides Learning in Deep Neural Networks

Neural Information Processing Systems

Artificial neural networks are most commonly trained with the back-propagation algorithm, where the gradient for learning is provided by back-propagating the error, layer by layer, from the output layer to the hidden layers. A recently discovered method called feedback alignment shows that the weights used for propagating the error backward don't have to be symmetric with the weights used for propagating the activation forward. In fact, random feedback weights work equally well, because the network learns how to make the feedback useful. In this work, the feedback alignment principle is used for training hidden layers more independently from the rest of the network, and from a zero initial condition. The error is propagated through fixed random feedback connections directly from the output layer to each hidden layer. This simple method is able to achieve zero training error even in convolutional networks and very deep networks, completely without error back-propagation. The method is a step towards biologically plausible machine learning because the error signal is almost local, and no symmetric or reciprocal weights are required. Experiments show that the test performance on MNIST and CIFAR is almost as good as that obtained with back-propagation for fully connected networks. If combined with dropout, the method achieves 1.45% error on the permutation-invariant MNIST task.
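A minimal sketch of a direct feedback alignment update as described above, written in JAX. The two-hidden-layer architecture, tanh activations, squared-error loss, layer sizes, and learning rate are illustrative assumptions, not details taken from the paper; the essential point is that each hidden layer receives the output error through its own fixed random feedback matrix, with no error back-propagation through the forward weights.

```python
import jax.numpy as jnp
from jax import random

key = random.PRNGKey(0)
k1, k2, k3, k4, k5 = random.split(key, 5)

# Illustrative sizes (e.g. MNIST-like input, 10-way output)
n_in, n_h, n_out = 784, 256, 10

# Forward weights (trained) and fixed random feedback matrices B1, B2
params = (
    random.normal(k1, (n_h, n_in)) * 0.01,   # W1
    random.normal(k2, (n_h, n_h)) * 0.01,    # W2
    random.normal(k3, (n_out, n_h)) * 0.01,  # W3
    random.normal(k4, (n_h, n_out)) * 0.01,  # B1: output error -> hidden layer 1
    random.normal(k5, (n_h, n_out)) * 0.01,  # B2: output error -> hidden layer 2
)

def dfa_step(params, x, y_target, lr=0.1):
    W1, W2, W3, B1, B2 = params
    # Forward pass
    h1 = jnp.tanh(W1 @ x)
    h2 = jnp.tanh(W2 @ h1)
    y = W3 @ h2                       # linear output
    e = y - y_target                  # output error under a squared-error loss
    # Direct feedback: project the output error straight to each hidden layer
    d2 = (B2 @ e) * (1.0 - h2 ** 2)   # tanh derivative
    d1 = (B1 @ e) * (1.0 - h1 ** 2)
    # Local weight updates (outer products); no back-propagation through W2, W3
    W3 = W3 - lr * jnp.outer(e, h2)
    W2 = W2 - lr * jnp.outer(d2, h1)
    W1 = W1 - lr * jnp.outer(d1, x)
    return (W1, W2, W3, B1, B2)
```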



Heterogeneity-Guided Client Sampling: Towards Fast and Efficient Non-IID Federated Learning

Neural Information Processing Systems

This has motivated numerous studies aiming to reduce the variance and improve the convergence of FL on non-IID data [6, 9, 14, 17, 19, 30]. Moreover, constraints on communication resources, and therefore on the number of clients that can participate in training, further complicate the implementation of FL schemes.



A Training and

Neural Information Processing Systems

All models were trained on single GPUs, except for SchNet when trained on OC20-2M, which required 3 GPUs. Tables 9-12 present the extended results on OC20 across the four separate S2EF validation sets. Table 9: Evaluation results on the OC20 S2EF in-distribution validation set. In Table 13, we present the performance and inference throughput of the baseline models on COLL. Table 13: Evaluation of the performance of the four baseline models on the COLL test set; columns report Model, Energy MAE, Force MAE, Force cos, EFwT, and inference throughput (samples / GPU sec.).
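A small sketch of how the reported metrics (Energy MAE, Force MAE, Force cos, EFwT) might be computed for one structure, in JAX. The exact metric definitions and the EFwT thresholds used here are assumptions for illustration, not values taken from this appendix.

```python
import jax.numpy as jnp

def s2ef_metrics(e_pred, e_true, f_pred, f_true,
                 e_thresh=0.02, f_thresh=0.03):
    """Per-structure metrics; threshold values are illustrative assumptions.

    e_pred, e_true: scalar predicted / reference energies.
    f_pred, f_true: (n_atoms, 3) predicted / reference forces.
    """
    energy_mae = jnp.abs(e_pred - e_true)
    force_mae = jnp.mean(jnp.abs(f_pred - f_true))
    # Cosine similarity between predicted and reference per-atom force vectors
    cos = jnp.sum(f_pred * f_true, axis=-1) / (
        jnp.linalg.norm(f_pred, axis=-1) * jnp.linalg.norm(f_true, axis=-1) + 1e-12)
    force_cos = jnp.mean(cos)
    # Energy and forces within threshold: both criteria must hold for the structure
    efwt = (energy_mae < e_thresh) & jnp.all(jnp.abs(f_pred - f_true) < f_thresh)
    return energy_mae, force_mae, force_cos, efwt
```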






b7ae8fecf15b8b6c3c69eceae636d203-Paper.pdf

Neural Information Processing Systems

Its tangent kernel $K(x,z)(w)$ is defined as follows:
$$K(x,z)(w) := \nabla_w f(w;x)^T \, \nabla_w f(w;z), \quad \text{for fixed inputs } x, z \in \mathbb{R}^d. \tag{1}$$
The key finding of [10] was the fact that for some wide neural networks the kernel $K(x,z)(w)$ is a constant function of the weight $w$ during training. While in the literature, including [10], this phenomenon is described in terms of the (linear) training dynamics, it is important to note that the tangent kernel is associated with the model itself.
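A short sketch of the empirical tangent kernel in equation (1), computed with JAX autodiff. The toy scalar-output model, layer width, and input values below are placeholders chosen for illustration; the kernel itself is just the inner product of the two weight gradients.

```python
import jax
import jax.numpy as jnp

def f(w, x):
    # Toy scalar-output model: one hidden tanh layer followed by a linear readout
    W1, w2 = w
    return jnp.dot(w2, jnp.tanh(W1 @ x))

def tangent_kernel(w, x, z):
    # K(x, z)(w) = <grad_w f(w; x), grad_w f(w; z)>, as in eq. (1)
    gx = jax.grad(f)(w, x)
    gz = jax.grad(f)(w, z)
    return sum(jnp.vdot(a, b)
               for a, b in zip(jax.tree_util.tree_leaves(gx),
                               jax.tree_util.tree_leaves(gz)))

# Placeholder weights and inputs
d, m = 4, 64
w = (jax.random.normal(jax.random.PRNGKey(0), (m, d)) / jnp.sqrt(d),
     jax.random.normal(jax.random.PRNGKey(1), (m,)) / jnp.sqrt(m))
x = jnp.ones(d)
z = jnp.arange(d, dtype=jnp.float32)
print(tangent_kernel(w, x, z))
```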